An efficient algorithm for large-scale detection of protein families.
نویسندگان
چکیده
Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
منابع مشابه
A TWO-STAGE METHOD FOR DAMAGE DETECTION OF LARGE-SCALE STRUCTURES
A novel two-stage algorithm for detection of damages in large-scale structures under static loads is presented. The technique utilizes the vector of response change (VRC) and sensitivities of responses with respect to the elemental damage parameters (RSEs). It is shown that VRC approximately lies in the subspace spanned by RSEs corresponding to the damaged elements. The property is leveraged in...
متن کاملCOMPUTATIONALLY EFFICIENT OPTIMUM DESIGN OF LARGE SCALE STEEL FRAMES
Computational cost of metaheuristic based optimum design algorithms grows excessively with structure size. This results in computational inefficiency of modern metaheuristic algorithms in tackling optimum design problems of large scale structural systems. This paper attempts to provide a computationally efficient optimization tool for optimum design of large scale steel frame structures to AISC...
متن کاملA Trust Region Algorithm for Solving Nonlinear Equations (RESEARCH NOTE)
This paper presents a practical and efficient method to solve large-scale nonlinear equations. The global convergence of this new trust region algorithm is verified. The algorithm is then used to solve the nonlinear equations arising in an Expanded Lagrangian Function (ELF). Numerical results for the implementation of some large-scale problems indicate that the algorithm is efficient for these ...
متن کاملA TWO-STAGE DAMAGE DETECTION METHOD FOR LARGE-SCALE STRUCTURES BY KINETIC AND MODAL STRAIN ENERGIES USING HEURISTIC PARTICLE SWARM OPTIMIZATION
In this study, an approach for damage detection of large-scale structures is developed by employing kinetic and modal strain energies and also Heuristic Particle Swarm Optimization (HPSO) algorithm. Kinetic strain energy is employed to determine the location of structural damages. After determining the suspected damage locations, the severity of damages is obtained based on variations of modal ...
متن کاملDPML-Risk: An Efficient Algorithm for Image Registration
Targets and objects registration and tracking in a sequence of images play an important role in various areas. One of the methods in image registration is feature-based algorithm which is accomplished in two steps. The first step includes finding features of sensed and reference images. In this step, a scale space is used to reduce the sensitivity of detected features to the scale changes. Afterw...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Nucleic acids research
دوره 30 7 شماره
صفحات -
تاریخ انتشار 2002